Building a Collation Element Table for a Large Chinese Character Set in YES
Abstract
YES is a simplified stroke-based method for sorting Chinese characters. Because it requires neither stroke counting nor stroke grouping, it is much faster and more accurate than the traditional method. This paper presents a collation element table built in YES for a large joint Chinese character set covering (a) all 20,902 characters of the Unicode CJK Unified Ideographs, (b) all 11,408 characters in the Complete List of Chinese Characters Used by the Media in 2013, and (c) the more than 13,000 characters in the latest editions of the Xinhua Dictionary (11th edition) and the Contemporary Chinese Dictionary (6th edition). Of the 20,902 Chinese characters in Unicode, 97.23% have a one-to-one relationship with their stroke-order codes in YES, compared with 90.69% under the traditional method. With stroke layout and Unicode value added as secondary and tertiary sorting levels, a one-to-one relationship between characters and collation elements is guaranteed. The collation element table has been successfully applied to sorting CC-CEDICT, a Chinese-English dictionary of over 112,000 word entries.
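The three sorting levels described above can be illustrated with a small sketch. It is not the paper's table format: the stroke-order codes and layout labels below are hypothetical placeholders, not real YES codes, and the word-level comparison (all primary weights first, then secondary, then tertiary) is borrowed from the general scheme of the Unicode Collation Algorithm as an assumption. The sketch shows why the tertiary level guarantees uniqueness: every character has a distinct Unicode code point, so no two characters can share a full collation element. It also computes the share of characters whose primary code alone is unique, which is the kind of figure behind the 97.23% and 90.69% numbers.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class CollationElement(NamedTuple):
    """Three-level weight for one character (illustrative, not the paper's format)."""
    primary: str    # stroke-order code; hypothetical values, not real YES codes
    secondary: str  # stroke-layout code; hypothetical label
    tertiary: int   # Unicode code point, unique per character


def build_table(entries: Iterable[Tuple[str, str, str]]) -> Dict[str, CollationElement]:
    """Build a table from (character, stroke_code, layout_code) triples."""
    return {ch: CollationElement(stroke, layout, ord(ch)) for ch, stroke, layout in entries}


def one_to_one_ratio(table: Dict[str, CollationElement]) -> float:
    """Share of characters whose primary (stroke-order) code no other character shares."""
    counts = Counter(ce.primary for ce in table.values())
    unique = sum(1 for ce in table.values() if counts[ce.primary] == 1)
    return unique / len(table)


def sort_key(word: str, table: Dict[str, CollationElement]):
    """Level-by-level key: all primaries, then all secondaries, then all tertiaries.

    Raises KeyError for characters missing from the table.
    """
    ces = [table[ch] for ch in word]
    return (
        tuple(ce.primary for ce in ces),
        tuple(ce.secondary for ce in ces),
        tuple(ce.tertiary for ce in ces),
    )


# Hypothetical codes for four characters: 人 and 八 are given the same
# primary code, so only the secondary (layout) level separates them.
TABLE = build_table([
    ("一", "1", "1"),
    ("十", "12", "1"),
    ("人", "34", "1"),
    ("八", "34", "2"),
])

if __name__ == "__main__":
    print(one_to_one_ratio(TABLE))  # 0.5
    print(sorted(["八", "人", "十", "一"], key=lambda w: sort_key(w, TABLE)))
    print(sorted(["人八", "人一", "一"], key=lambda w: sort_key(w, TABLE)))
```

With these placeholder codes, the single characters sort as 一, 十, 人, 八, and the words sort as 一, 人一, 人八. Two of the four characters share a primary code, so the primary-level one-to-one ratio is 0.5, yet every full key is still distinct because of the tertiary level.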